Gradient learning in spiking neural networks by dynamic perturbation of conductances

We present a method of estimating the gradient of an objective function with respect to the 
synaptic weights of a spiking neural network. The method works by measuring the fluctuations 
in the objective function in response to dynamic perturbation of the membrane conductances of 
the neurons. It is compatible with recurrent networks of conductance-based model neurons with 
dynamic synapses. The method can be interpreted as a biologically plausible synaptic learning 
rule, if the dynamic perturbations are generated by a special class of "empiric" synapses driven by 
random spike trains from an external source. 

Neural network learning is often formulated in terms of an objective function that quantifies
performance at a desired computational task. The network is trained by estimating the gradient
of the objective function with respect to synaptic weights, and then changing the weights
in the direction of the gradient.

If neural and network dynamics and the objective function are all exactly known functions of
the weights, such learning can be accomplished by explicitly computing the relevant gradients.
A famous example of this approach, used with wide success in non-spiking, deterministic
artificial neural networks, is the backpropagation (BP) algorithm [13].

However, the relevance of BP to neurobiological learning is limited. Biological neural activity
can be noisy, and involves the highly nonlinear and often history-dependent dynamics of
membrane voltages and conductances: neurons generate voltage spikes, and the efficacy of
synaptic transmission varies dynamically, on a spike-by-spike basis. Further, the objective
function in neurobiological learning may depend on the dynamics of muscles and external
variables of the world unknown to the brain. Similar complications are also present in analog
on-chip or robotic implementations of machine learning.

For learning in such systems, alternative strategies are necessary. The method of weight
perturbation estimates the gradients by perturbing synaptic weights and observing the change
in the objective function. Unlike BP, weight perturbation is completely "model-free": it does
not depend on knowing anything about the functional dependence of the objective on the network
weights, and it can be applied to stochastic spiking networks. The disadvantage of a completely
model-free approach is the tradeoff between generality and learning speed: weight perturbation
is far more widely applicable than BP, but BP is much faster when it is applicable.
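
As a point of reference, weight perturbation can be sketched in a few lines. The example below
is only an illustration under stated assumptions: the objective `toy_objective`, the noise
amplitude `sigma`, and the number of trials are hypothetical choices, and the unperturbed
objective is used as the baseline. The estimator never inspects the form of the objective.

```python
import random


def weight_perturbation_gradient(objective, weights, sigma=1e-3, n_trials=4000, seed=0):
    """Estimate the gradient of `objective` by randomly perturbing all weights."""
    rng = random.Random(seed)
    r0 = objective(weights)  # unperturbed baseline
    grad = [0.0] * len(weights)
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, sigma) for _ in weights]
        r = objective([w + n for w, n in zip(weights, noise)])
        # Correlate the change in the objective with each weight's perturbation.
        for k, n in enumerate(noise):
            grad[k] += (r - r0) * n / (sigma ** 2 * n_trials)
    return grad


def toy_objective(w):
    """Hypothetical smooth objective; its exact gradient is -2 * (w_k - 1)."""
    return -sum((wk - 1.0) ** 2 for wk in w)


weights = [0.0, 2.0, 0.5]
estimate = weight_perturbation_gradient(toy_objective, weights)
exact = [-2.0 * (wk - 1.0) for wk in weights]
```

With many trials `estimate` approaches `exact`, but the number of trials needed grows with the
number of weights, which is the speed penalty discussed above.
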

Here we propose a method that is intermediate between these two extremes, yet is applicable
to arbitrary spiking neural networks. Instead of making perturbations to the synaptic weights,
it estimates the $N^2$ weight gradients through dynamic perturbation of the conductances of
the $N$ network neurons. Our algorithm does this by exploiting a feature generic to many models
of neural networks: that inputs to a neuron combine additively before being subjected to
further nonlinearities. Otherwise, the algorithm is model-free. Our approach generalizes the
concept of node perturbation, which has been proposed for training feedforward networks of
non-spiking neurons and can be much faster than weight perturbation. We show how neural
conductance perturbations can be used in a biologically plausible way to perform synaptic
gradient learning in fully recurrent networks of realistic spiking neurons.

Spiking neural networks. We briefly discuss the mathematical conditions under which our
assumption, that the synaptic inputs to a single neuron combine linearly, holds in spiking
neural networks. If each neuron $i$ is electrotonically compact, it can be described by a
transmembrane voltage $V_i$ obeying the current balance equation
$C_i \, dV_i/dt = -I_i^{\mathrm{int}}(t) - I_i^{\mathrm{syn}}(t)$. The intrinsic current
$I_i^{\mathrm{int}}$ is generally a nonlinear function of voltage and of dynamical variables
associated with the spike-generating conductances in the membrane. The dynamics of these
variables may be arbitrarily complex (e.g., the Hodgkin-Huxley model) without affecting our
derivations. A simple model for the synaptic current is
$I_i^{\mathrm{syn}}(t) = \sum_j W_{ij} s_{ij}(t) (V_i(t) - E_{ij})$. The time-varying synaptic
conductance from neuron $j$ to neuron $i$ is $W_{ij} s_{ij}(t)$, with amplitude controlled by
the parameter $W_{ij}$. Its time course is determined by $s_{ij}(t)$, which could include
complex forms of short-term depression and facilitation. If the reversal potentials $E_{ij}$
of the synapses are all the same, then the synaptic current can be written as
$I_i^{\mathrm{syn}}(t) = g_i(t)(V_i(t) - E^{\mathrm{syn}})$, where

$$g_i(t) = \sum_j W_{ij}\, s_{ij}(t) \tag{1}$$

is the sum of all postsynaptic conductances of the synapses onto neuron $i$. The linear
dependence of $g_i(t)$ on the synaptic weights $W_{ij}$ will be critical below. However, this
linear dependence may be embedded inside a nonlinear network, which may be arbitrarily complex
without affecting the following derivations. In fact, all networks (neural and spiking, or
neither) that depend on a set of interaction variables $s_{ij}(t)$ and parameters $W_{ij}$
through Eq. (1) satisfy the necessary conditions for our derivation below.

Gradient learning. We represent the state of the network by a vector $\Omega(t)$, which
includes the synaptic variables $s_{ij}(t)$ and all other dynamical variables (e.g., the
voltages $V_i(t)$ and all variables associated with the membrane conductances). Starting from
an initial condition $\Omega(0)$, the network generates a trajectory from time $t = 0$ to
$t = T$, and in response receives a scalar "reinforcement" signal $R[\Omega]$, which is an
arbitrary functional of the trajectory. For now we assume that the network dynamics are
deterministic, and present the fully stochastic case in the Appendix. Each trajectory along
with its reinforcement is called a "trial," and the learning process is iterative, extending
over a series of trials. The signal $R$ depends implicitly on the synaptic weights $W_{ij}$,
and is an objective function for learning. In other words, the goal of learning is to find
synaptic weights that maximize $R$. A heuristic method for doing this is to follow the gradient
of $R$ with respect to $W_{ij}$. Next we derive our gradient learning rule.

FIG. 1: Neurons in a recurrent network ("actor"), connected by modifiable weights W. In
addition, each neuron i receives an empiric synapse carrying perturbing input from an external
"experimenter". A global reinforcement signal R is broadcast by a "critic" to all neurons in
the network.

Sensitivity lemma. Suppose that $W_{ij}(t)$ were a time-varying function. Then by Eq. (1) and
the chain rule, it would follow that

$$\frac{\delta R}{\delta W_{ij}(t)} = \frac{\delta R}{\delta g_i(t)}\, s_{ij}(t). \tag{2}$$

But if $W_{ij}(t)$ is constrained to take on the same value at every time, it follows that

$$\frac{dR}{dW_{ij}} = \int_0^T dt\, \frac{\delta R}{\delta g_i(t)}\, s_{ij}(t). \tag{3}$$

We call this the sensitivity lemma, because it relates the sensitivity of $R$ to changes in
$W_{ij}$ to the sensitivity to changes in $g_i(t)$. The implication of the lemma is that
dynamic perturbations of the variables $g_i(t)$ can be used to instruct modifications of the
static parameters $W_{ij}$.
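
The sensitivity lemma can be checked numerically on a small example. The sketch below is not
a model from this paper: it assumes a hypothetical discrete-time rate neuron with one external
input and one recurrent (self-feedback) input, and a hypothetical reward that depends on the
final voltage. In discrete time the functional derivative with respect to $g_i(t)$ becomes an
ordinary partial derivative with respect to a perturbation added at time step `t`, which is
estimated here by finite differences.

```python
import math


def simulate(w, xi, dt=0.1):
    """Toy recurrent rate neuron (hypothetical). Returns reward R and recorded s[t][j]."""
    v = 0.0
    s_record = []
    for t in range(len(xi)):
        s = [math.sin(0.3 * t), v]  # s_0: external drive, s_1: feedback from the neuron
        g = sum(wj * sj for wj, sj in zip(w, s)) + xi[t]  # total conductance plus perturbation
        v += dt * (-v + math.tanh(g))
        s_record.append(s)
    return -(v - 0.5) ** 2, s_record


def lemma_gradient(w, T=50, eps=1e-6):
    """dR/dW_j from the lemma: sum over t of dR/dg(t) * s_j(t)."""
    zero = [0.0] * T
    _, s = simulate(w, zero)  # synaptic variables of the unperturbed run
    grad = [0.0] * len(w)
    for t in range(T):
        up, dn = list(zero), list(zero)
        up[t], dn[t] = eps, -eps
        dR_dg = (simulate(w, up)[0] - simulate(w, dn)[0]) / (2 * eps)
        for j in range(len(w)):
            grad[j] += dR_dg * s[t][j]
    return grad


def direct_gradient(w, T=50, eps=1e-6):
    """dR/dW_j by central finite differences on the weights themselves."""
    zero = [0.0] * T
    grad = []
    for j in range(len(w)):
        up, dn = list(w), list(w)
        up[j] += eps
        dn[j] -= eps
        grad.append((simulate(up, zero)[0] - simulate(dn, zero)[0]) / (2 * eps))
    return grad


w = [0.8, -0.5]
from_lemma = lemma_gradient(w)
direct = direct_gradient(w)
```

The two gradients agree up to finite-difference error even though the recurrent input depends
on the weights through the dynamics, because the sensitivity to a perturbation at time `t`
already includes all of its downstream effects.
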

Gradient estimation. In order to estimate $\delta R/\delta g_i(t)$, suppose that Eq. (1) is
perturbed by a fluctuating white noise,

$$g_i(t) = \sum_j W_{ij}\, s_{ij}(t) + \xi_i(t). \tag{4}$$

The white noise satisfies $\langle \xi_i(t) \rangle = 0$ and
$\langle \xi_i(t_1)\, \xi_j(t_2) \rangle = \sigma^2 \delta_{ij}\, \delta(t_1 - t_2)$, where the
angle brackets denote a trial average. For now, let us regard this perturbation as a
mathematical device; its biological interpretation will be discussed later.

To show that $\delta R/\delta g_i(t)$ can be estimated from the covariance of $R$ and the
perturbation $\xi_i(t)$, use the linear approximation
$R - R_0 \approx \int_0^T dt \sum_k (\delta R/\delta g_k(t))\, \xi_k(t)$, which is accurate
when the perturbations $\xi_k(t)$ are small. Here $R_0$ is defined as $R$ in the absence of
any perturbations, $\xi = 0$. Since the perturbations are white noise, it follows that

$$\langle (R - R_0)\, \xi_i(t) \rangle \approx \sigma^2\, \frac{\delta R}{\delta g_i(t)}. \tag{5}$$

Because $\langle \xi \rangle = 0$, the baseline $R_0$ may be replaced by any quantity that is
uncorrelated with the perturbations of the current trial. For example, choosing $R_0 = 0$
leaves Eq. (5) valid. However, baseline subtraction can have a large effect on the variance of
the estimate (5) when it is based on a finite number of trials [10]. Thus a good choice of
baseline can decrease learning time, sometimes dramatically.
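
The effect of the baseline can be illustrated with a small Monte Carlo experiment. The sketch
below makes several assumptions that are not part of the derivation above: time is discrete,
the white noise is Gaussian with variance `sigma ** 2` per time step, and the reward is a
hypothetical function whose sensitivities at zero perturbation are the entries of `A`. It
compares the covariance estimate with the unperturbed reward as baseline against the same
estimate with a baseline of zero.

```python
import math
import random

A = [1.0, -2.0, 0.5, 0.0]  # hypothetical sensitivities dR/dg(t) at zero perturbation


def reward(xi):
    """Hypothetical reward with a large offset and a mild nonlinearity."""
    return 5.0 + math.tanh(sum(a * x for a, x in zip(A, xi)))


def covariance_estimate(baseline, sigma=0.05, n_trials=5000, seed=1):
    """Return per-time-step means and variances of (R - baseline) * xi(t) / sigma^2."""
    rng = random.Random(seed)
    T = len(A)
    sums = [0.0] * T
    sq_sums = [0.0] * T
    for _ in range(n_trials):
        xi = [rng.gauss(0.0, sigma) for _ in range(T)]
        r = reward(xi)
        for t in range(T):
            sample = (r - baseline) * xi[t] / sigma ** 2
            sums[t] += sample
            sq_sums[t] += sample ** 2
    means = [s / n_trials for s in sums]
    variances = [q / n_trials - m ** 2 for q, m in zip(sq_sums, means)]
    return means, variances


# Baseline equal to the unperturbed reward versus no baseline at all.
means_sub, var_sub = covariance_estimate(baseline=reward([0.0] * len(A)))
means_raw, var_raw = covariance_estimate(baseline=0.0)
```

Both choices give an estimate with the correct mean, but in this example the single-trial
variance without a baseline is larger by orders of magnitude, because the constant offset of
the reward multiplies the noise.
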

If the covariance relation of Eq. (5) is combined with the sensitivity lemma, Eq. (3), it
follows that

$$\sigma^2\, \frac{dR}{dW_{ij}} \approx \int_0^T dt\, \langle (R - R_0)\, \xi_i(t) \rangle\, s_{ij}(t). \tag{6}$$

Synaptic learning rule. Equation (6) suggests the following stochastic gradient learning
procedure. At each synapse the purely local eligibility trace

$$e_{ij} = \int_0^T dt\, \xi_i(t)\, s_{ij}(t) \tag{7}$$

is accumulated over the trajectory. At the end of the trajectory, the synaptic weight is
updated according to

$$\Delta W_{ij} = \eta\, (R - R_0)\, e_{ij}. \tag{8}$$

The update $\Delta W_{ij}$ fluctuates because of the randomness in the perturbations. On
average, the update points in the direction of the gradient, because it satisfies
$\langle \Delta W_{ij} \rangle \propto dR/dW_{ij}$, according to Eq. (6). This means that the
learning rule of Eq. (8) is stochastic gradient following.

We note one subtlety in the derivation: in Eq. (6) the synaptic variables $s_{ij}(t)$ are
defined in the presence of perturbations, while in the sensitivity lemma they are defined for
$\xi = 0$. In the linear approximations above, this discrepancy leads to a higher-order
correction that is negligible for small perturbations.
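
The eligibility trace and the end-of-trial weight update described above can be sketched on a
toy problem. Everything specific in this example is an assumption made for illustration: a
single non-spiking unit whose reward depends only on its total conductance, two fixed
presynaptic time courses in `S`, a target produced by hypothetical weights `W_TARGET`, discrete
time, and the unperturbed reward as the baseline (computed with an extra run, which a
biological implementation would have to replace by something like a running average).

```python
import math
import random

T, N = 20, 2
# Hypothetical presynaptic time courses s_j(t): one sine and one cosine input.
S = [[math.sin(2 * math.pi * t / T), math.cos(2 * math.pi * t / T)] for t in range(T)]
W_TARGET = [1.0, -0.5]
TARGET = [sum(w * s for w, s in zip(W_TARGET, S[t])) for t in range(T)]


def run_trial(w, xi):
    """Return the reward R for weights w and conductance perturbation xi(t)."""
    g = [sum(wj * sj for wj, sj in zip(w, S[t])) + xi[t] for t in range(T)]
    return -sum((g[t] - TARGET[t]) ** 2 for t in range(T)) / T


def train(w, eta=5.0, sigma=0.1, n_trials=400, seed=0):
    """Apply the perturbation-based learning rule for n_trials trials."""
    rng = random.Random(seed)
    w = list(w)
    zero = [0.0] * T
    for _ in range(n_trials):
        xi = [rng.gauss(0.0, sigma) for _ in range(T)]
        r = run_trial(w, xi)
        r0 = run_trial(w, zero)  # baseline: unperturbed reward (a simplification)
        for j in range(N):
            e_j = sum(xi[t] * S[t][j] for t in range(T))  # eligibility trace
            w[j] += eta * (r - r0) * e_j                  # end-of-trial weight update
    return w


w_init = [0.0, 0.0]
w_final = train(w_init)
r_init = run_trial(w_init, [0.0] * T)
r_final = run_trial(w_final, [0.0] * T)
```

The update uses only the scalar reward, the perturbation, and the presynaptic variable at each
synapse; it never differentiates the reward function. In this example `r_final` ends up much
closer to the maximum of zero than `r_init`.
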

Biological interpretation. According to the above, synaptic weight gradients of $R$ can be
estimated using conductance perturbations $\xi_i(t)$. Could this mathematical trick be used
by the brain? In the actor-critic terminology of reinforcement learning, one can imagine that
the neurons of one brain area (the "actor") drive actions that are assessed by another brain
area (the "critic"), which in response issues a global, scalar reinforcement signal $R$ to
the actor (Fig. 1). A novel feature of our rule is that in addition to its regular synapses
$W_{ij}$, the actor would receive a special class of "empiric" synapses from another
hypothesized part of the brain (the "experimenter"), which perturb the actor from trial to
trial. Each plastic synapse locally computes and stores its scalar eligibility and multiplies
this with $R$ to undergo modification. This idea is developed in detail elsewhere in a model
of birdsong learning, resulting in concrete, nontrivial predictions for synaptic plasticity in
the brain.

Note that if the perturbation $\xi_i(t)$ is a synaptic conductance, its mean value
$\langle \xi_i \rangle$ must be positive. Then the linear approximations above are expansions
about the mean conductance $\xi_i = \langle \xi_i \rangle$, rather than $\xi_i = 0$. As a
result, $\xi_i(t)$ must be replaced by the zero-mean fluctuation
$\delta\xi_i(t) = \xi_i(t) - \langle \xi_i \rangle$ in the eligibility trace. In addition, the
fluctuations $\delta\xi_i(t)$ will not be truly white, but will have a correlation time set by
the time constant of the synaptic currents. However, if this correlation time is short
relative to the time scale of variation in $\delta R/\delta g_i(t)$, then the gradient
estimate of Eq. (6) should still be accurate.

Accurate gradient estimation requires that the eligibility trace filter out the mean
conductance $\langle \xi_i \rangle$ of the empiric synapse. This operation is biologically
plausible, and can be implemented by a simple time average at every "actor" neuron, if the
empiric synapses are driven at a constant or very slowly varying rate.
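
A minimal sketch of this mean subtraction follows. It assumes a hypothetical empiric synapse
whose conductance is an exponentially filtered random spike train; the rate, time constant,
time step, and amplitude are arbitrary illustrative values, and the spike train is generated
with a per-step Bernoulli approximation to a Poisson process.

```python
import math
import random


def empiric_conductance(rate_hz=200.0, tau_s=0.005, dt=0.001, T_s=2.0, amp=1.0, seed=0):
    """Conductance of a hypothetical empiric synapse driven by a random spike train."""
    rng = random.Random(seed)
    decay = math.exp(-dt / tau_s)
    xi, x = [], 0.0
    for _ in range(int(round(T_s / dt))):
        x *= decay                       # exponential decay of the synaptic conductance
        if rng.random() < rate_hz * dt:  # a presynaptic spike arrives in this time step
            x += amp
        xi.append(x)
    return xi


def zero_mean_fluctuation(xi):
    """Subtract the time-averaged conductance, leaving the zero-mean fluctuation."""
    mean = sum(xi) / len(xi)
    return [x - mean for x in xi]


xi = empiric_conductance()
dxi = zero_mean_fluctuation(xi)
```

The conductance `xi` is never negative, while `dxi` fluctuates around zero and is the quantity
that would enter the eligibility trace. Its correlation time is set by `tau_s`, so it is only
approximately white.
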

By contrast, other proposals for stochastic gradient learning typically require individual
neurons to keep track of and filter out a time-varying average vector of neural or synaptic
activity within each trial, which seems rather complex. The added complexity arises because
these proposals are based on fluctuations in network dynamics caused by stochasticity
intrinsic to neurons or synapses in the actor network; thus, the average perturbation is a
function of the network trajectory and is time-varying. Our algorithm avoids this complexity,
because the fluctuations are injected by an extrinsic source, and are therefore independent of
the network trajectory. Our approach has the additional advantage that the degree of
exploration in the actor can be modified independently of activity in the actor.

Generalization to excitatory and inhibitory synapses. Above we assumed that all synapses have
the same reversal potential. But neurons may receive both excitatory and inhibitory synapses,
which have different reversal potentials. The unmodified learning rule allows both synapse
types to perform gradient following if there are two types of empiric synapses per neuron: an
excitatory empiric synapse used to train the excitatory synapses, and an inhibitory empiric
synapse used to train the inhibitory synapses. But if there is only one empiric synapse per
neuron, then for both types of synapses to perform gradient following, the rule must be
modified. Let $E_{ij}$ and $E_i^{\xi}$ be the reversal potentials of the regular
$i \leftarrow j$ synapse and of the empiric synapse onto the $i$th actor neuron, respectively.
Then we obtain a generalized sensitivity lemma:

$$\frac{dR}{dW_{ij}} = \int_0^T dt\, \frac{\delta R}{\delta \xi_i(t)}\, a_{ij}(t)\, s_{ij}(t), \tag{9}$$

where

$$a_{ij}(t) = \frac{V_i(t) - E_{ij}}{V_i(t) - E_i^{\xi}} \tag{10}$$

is the ratio of the synaptic driving force at the $i \leftarrow j$ synapse to the driving
force of the empiric synapse at neuron $i$. The stochastic gradient learning rule remains
$\Delta W_{ij} = \eta\, (R - R_0)\, e_{ij}$, but with modified eligibility trace

$$e_{ij} = \int_0^T dt\, \xi_i(t)\, a_{ij}(t)\, s_{ij}(t).$$

For synapses with the same reversal potential as the empiric synapse, $a_{ij}(t) = 1$,
returning the original learning rule. Even for synapses of the opposite variety, the sign of
$a_{ij}$ does not change with time because neural voltage is constrained to stay between the
inhibitory and excitatory reversal potentials $V_I$ and $V_E$ ($V_I < V_i(t) < V_E$), and
$E_i^{\xi}, E_{ij} \in \{V_I, V_E\}$. Nevertheless, for these synapses of the opposite variety,
the term $a_{ij}(t)$ adds complexity to the simple learning rule and reduces its biological
plausibility.
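
The behavior of the driving-force ratio can be made concrete with a short calculation. The
reversal potentials below are hypothetical values in millivolts chosen only for illustration,
and the empiric synapse is assumed to be excitatory.

```python
def driving_force_ratio(v, e_syn, e_empiric):
    """Ratio of the regular synapse's driving force to that of the empiric synapse."""
    return (v - e_syn) / (v - e_empiric)


# Hypothetical inhibitory and excitatory reversal potentials in mV.
V_I, V_E = -70.0, 0.0
# Voltages from -69 mV to -1 mV, strictly between the two reversal potentials.
voltages = [-69.0 + k for k in range(69)]

# Excitatory synapse with an excitatory empiric synapse: the ratio is identically 1.
same_type = [driving_force_ratio(v, V_E, V_E) for v in voltages]
# Inhibitory synapse with an excitatory empiric synapse: the ratio varies but keeps its sign.
opposite_type = [driving_force_ratio(v, V_I, V_E) for v in voltages]
```

For the opposite synapse type the ratio is negative at every voltage in the allowed range, so
the factor changes the magnitude of the eligibility trace over time but never flips its sign.
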

Generalization to multicompartmental model neurons. Suppose the model neuron is not
isopotential, but has several dendritic compartments. Then it can be trained without
modification of the learning rule by using a separate empiric synapse for each compartment.
Alternatively, a single empiric synapse could be used for the whole neuron, but with the
introduction of complexities in the learning rule similar to the $a_{ij}(t)$ factor of
Eq. (10).

Technical issues. Our synaptic learning rule performs stochastic gradient following, and
therefore shares the virtues and deficiencies of all methods in this class [17]. For example,
it is possible to become stuck at a local optimum of the objective function. The stochasticity
of the gradient estimation may allow some small probability of escape, but there is no
guarantee of finding a global optimum.

The derivation of our learning rule in particular, and of gradient rules in general, depends
on the smoothness assumption that $R$ is a differentiable function of the synaptic weights.
But $R$ depends on $W$ through the spiking activity of the actor network, and spiking neurons
typically exhibit threshold spike-or-no-spike behaviors, so one might worry that $R$ is
discontinuous. However, because either the amplitude or the latency of neural spiking varies
continuously as a function of input near threshold [18], there is typically no true
discontinuity.

Comparison with previous work. If the perturbation $\xi_i(t)$ is Gaussian white noise, then
our synaptic learning rule can be included as a member of the REINFORCE class of algorithms.
With Gaussian white noise we can use the REINFORCE formalism to prove that our learning rule
performs stochastic gradient ascent on $R$ without assuming that the perturbations are small,
because linear approximations are not used. In contrast, our present derivation does not
require the perturbations to be Gaussian, but assumes they are approximately white and of
small amplitude. The REINFORCE theory could also be used for non-Gaussian perturbations if
$\xi_i(t)$ is drawn i.i.d. from a smooth probability density function (PDF). However, the
resulting learning rule will be different from ours. Further, the assumption of smoothness of
the PDF can seriously limit the applicability of the REINFORCE theory: for example, a $\xi$
generated by filtering a random spike train cannot be treated by REINFORCE.

The sensitivity lemma allows us to derive rules for synaptic gradient learning based on
perturbations of other quantities not directly related to the synaptic parameters. Versions
of the sensitivity lemma have appeared in the literature for nonspiking feedforward networks,
and have been used to estimate the gradient by serially perturbing one neuron at a time (node
perturbation) [19]. Our version of the sensitivity lemma is more general, because it is
applicable to learning trajectories in recurrent networks, via parallel perturbation of
multiple neurons. Most importantly, we have shown how to use it to derive a biologically
plausible rule for gradient learning in spiking networks.

APPENDIX: Stochastic networks. Above, the network dynamics and reinforcement $R$ were assumed
to be deterministic. Both elements can be made stochastic, as outlined below. Consider the
case of discrete time (continuous time is a limiting case). The network generates a trajectory
$\Omega = \{\Omega(0), \Omega(1), \ldots, \Omega(T)\}$ from a probability density
$P_W(\Omega)$. Suppose each trajectory is generated by drawing an initial condition
$\Omega(0)$ from some probability density and then drawing $\Omega(1)$ through $\Omega(T)$
from a Markov process with transition probability $P_W(\Omega(t) | \Omega(t-1))$. The
assumption of Markov transition probabilities is compatible with most spiking neural network
models. The network receives reinforcement $R$ from the conditional density $P(R | \Omega)$.
Since the network is parametrized by $W$, the expected reward

$$\langle R \rangle = \int R\, P(R | \Omega)\, P_W(\Omega)\, dR\, \mathcal{D}\Omega \tag{11}$$



is a function of $W$. We assume that the transition probability depends on the weights $W$
through

$$P_W(\Omega(t) | \Omega(t-1)) = f(g_1(t), \ldots, g_N(t)), \tag{12}$$

where, as before,

$$g_i(t) = \sum_j W_{ij}\, s_{ij}(t-1). \tag{13}$$



The transition probability depends on all the dynamical variables in $\Omega(t)$, although
they have been suppressed for notational simplicity in Eq. (12). As before, the important
mathematical property here is the linearity of Eq. (13), which is embedded inside a nonlinear
system. The sensitivity lemma takes the form:

$$\frac{\partial \langle R \rangle}{\partial W_{ij}} = \sum_t \frac{\partial}{\partial g_i(t)} \langle R\, s_{ij}(t-1) \rangle. \tag{14}$$



The sensitivity lemma shows that the appropriate change in the weight of a synapse is not
given by the covariance of its activity with reinforcement (as might be naively expected), but
is instead given by the derivative with respect to $g_i(t)$ of this covariance. As before, the
proof of the sensitivity lemma involves comparing derivatives of the transition probabilities
taken with respect to $W_{ij}$ and $g_i(t)$, without actually performing either
differentiation. Note that REINFORCE requires the stronger condition that the log probability
be differentiable. For small perturbations $\xi_i(t)$, this sensitivity lemma leads us again
to the gradient learning rule of Eqs. (7) and (8), now valid for fully stochastic networks.